Clean speech reconstruction from noisy mel-frequency cepstral coefficients using a sinusoidal model
Abstract
This paper extends speech reconstruction from MFCCs to the case of noisy speech. Reconstructing a clean speech signal from noise-contaminated MFCCs requires an estimate of the clean mel-filterbank vector together with a robust estimate of the pitch. This work applies spectral subtraction to the mel-filterbank vector (derived from the noisy MFCCs) to provide a clean speech spectral estimate, and uses a robust extraction technique to obtain a reliable pitch estimate. Spectrograms and informal listening tests indicate that a clean speech signal can be successfully reconstructed from the noisy MFCCs. Pitch errors are shown to manifest themselves as artificial-sounding bursts in the reconstructed speech signal, while incorrect estimates of the spectral envelope introduce periods of noise into the reconstructed speech.
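The spectral-envelope step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the unnormalised DCT-II basis, the pseudo-inverse used to undo cepstral truncation, the over-subtraction factor `alpha`, the spectral floor `beta`, and all function names are assumptions made for illustration, and the noise vector is assumed to be supplied by the caller (for example, averaged over frames believed to contain no speech).

```python
import numpy as np


def dct_matrix(n_ceps, n_filters):
    """Unnormalised DCT-II basis mapping log mel-filterbank energies to MFCCs.

    Assumption: the front-end's actual DCT scaling may differ.
    """
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_filters)


def mfcc_to_filterbank(mfcc, n_filters):
    """Approximately invert MFCCs (frames x n_ceps) to mel-filterbank energies.

    The pseudo-inverse gives a smoothed estimate when the cepstrum was
    truncated (n_ceps < n_filters) and an exact inverse when it was not.
    """
    basis = dct_matrix(mfcc.shape[-1], n_filters)
    log_fb = mfcc @ np.linalg.pinv(basis).T
    return np.exp(log_fb)


def spectral_subtract(noisy_fb, noise_fb, alpha=1.0, beta=0.01):
    """Subtract a noise estimate from noisy filterbank energies.

    alpha scales the noise estimate; beta sets a floor relative to the noisy
    energy so the result stays positive. Both values are illustrative.
    """
    cleaned = noisy_fb - alpha * noise_fb
    return np.maximum(cleaned, beta * noisy_fb)
```

A typical use would invert each noisy MFCC frame with `mfcc_to_filterbank`, estimate `noise_fb` from non-speech frames, and pass both to `spectral_subtract`. The resulting envelope estimate would then be combined with the pitch estimate for synthesis, which this sketch does not cover.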
Similar resources
Robust algorithms for speech reconstruction on mobile devices
This thesis is concerned with reconstructing an intelligible time-domain speech signal from speech recognition features, such as Mel-frequency cepstral coefficients (MFCCs), in a distributed speech recognition (DSR) environment. The initial reconstruction methods in this thesis require, in addition to MFCC vectors, fundamental frequency and voicing information. In the later parts of the thesis t...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Throat Microphone for Speaker Recognition Using AANN
In this paper, we analyze the performance of a speaker recognition system based on features extracted from speech recorded with a throat microphone in clean and noisy environments. In general, clean speech yields better speaker recognition performance. In a noisy environment, a transducer held at the throat produces a signal that remains clean even in noisy conditions. This speak...
On compensating the Mel-frequency cepstral coefficients for noisy speech recognition
This paper describes a novel noise-robust automatic speech recognition (ASR) front-end that employs a combination of Mel-filterbank output compensation and cumulative distribution mapping of cepstral coefficients with truncated Gaussian distribution. Recognition experiments on the Aurora II connected digits database reveal that the proposed front-end achieves an average digit recognition accura...
Feature Normalisation for Robust Speech Recognition
Speech recognition system performance degrades in noisy environments. If the acoustic models (HMMs) for speech are built using features of clean utterances, the features of a noisy test utterance would be acoustically mismatched with the trained model. This gives poor likelihood values and poor recognition accuracy. Model adaptation and feature normalisation are two broad areas that address thi...
Publication date: 2003